BACKGROUND OF THE INVENTION:

FIELD OF THE INVENTION: 

The present invention generally relates to a
utility service that automates management of system
health operations in monitoring, prediction, and
notification, and provides the ability to correct system
health issues in a digital environment.

DESCRIPTION OF RELATED ART: 

Monitoring the health of complex multiprocessor
systems is a difficult and time-consuming task.
Human operators must know where and how frequently to
check for problem conditions and how to react to correct
them when found. Recognizing the signs of a possible
future problem so that it can be avoided altogether is
even more difficult and is not a task that is performed
with any consistency across customer sites. Earlier
releases of the Unisys Server Sentinel software suite
attempted to address these issues for Unisys ES7000
systems through a set of approximately 20 knowledge
scripts provided with the Server Director product.
Although these scripts provided automated monitoring for
predetermined alert conditions, each script had to be
separately configured and deployed by technical staff at
each customer site. The conditions being monitored were
generally only things that could be expressed very simply
(such as simple threshold violations), and the script set
provided little in the way of predictive monitoring.

One related art method to which the method of
the present invention generally relates is described in
U.S. Patent No. 4,881,230 entitled "Expert System For
Processing Errors In A Multiplex Communications System".
This prior related art method is a method and apparatus
for detecting and analyzing errors in a communications
system. The method employs expert system techniques to
isolate failures to specific field replaceable units and
provide detailed messages to guide an operator to a
solution. The expert system techniques include detailed
decision trees designed for each resource in the system.
The decision trees also filter extraneous sources of
errors from affecting the error analysis results.

The present invention differs from the above
related cited art in that the prior invention deals
specifically with a "communications system", not a
general-purpose computer system. The cited prior
reference targets actual failures of field replaceable
hardware units, whereas the present invention will detect
warning conditions that predict failure (as well as
failures that have already occurred) and is capable of
monitoring software as well as hardware.

Yet another related art method to which the
method of the present invention generally relates is
described in U.S. Patent No. 6,263,452 entitled
"Fault-tolerant Computer System With Online Recovery And
Reintegration Of Redundant Components". This prior
related art method involves a computer system in a
fault-tolerant configuration which employs multiple
identical CPUs executing the same instruction stream, with
multiple, identical memory modules in the address space
of the CPUs storing duplicates of the same data. The
system detects faults in the CPUs and memory modules, and
places a faulty unit offline while continuing to operate
using the good units. The faulty unit can be replaced and
reintegrated into the system without shutdown. The
multiple CPUs are loosely synchronized, as by detecting
events such as memory references and stalling any CPU
ahead of others until all execute the function
simultaneously; interrupts can be synchronized by
ensuring that all CPUs implement the interrupt at the
same point in their instruction stream. Memory references
via the separate CPU-to-memory busses are voted at the
three separate ports of each of the memory modules. I/O
functions are implemented using two identical I/O busses,
each of which is separately coupled to only one of the
memory modules. A number of I/O processors are coupled to
both I/O busses. I/O devices are accessed through a pair
of identical (redundant) I/O processors, but only one is
designated to actively control a given device; in case of
failure of one I/O processor, however, an I/O device can
be accessed by the other one without system shutdown.

The present invention differs from this related
art in that the cited prior art focuses on a method that
deals with fault-tolerant configuration of redundant
CPUs. The method of the present invention is not limited
to hardware and is concerned with reporting hardware and
software problems rather than automatically swapping out
bad hardware components.

Another related art method to which the method
of the present invention generally relates is described
in U.S. Patent No. 6,237,114 entitled "System And Method
For Evaluating Monitored Computer Systems". This prior
related art method is a computer system used in
monitoring another computer system and provides both
textual resolution information describing a likely
solution for a problem encountered in the monitored
computer system as well as component information that
relates to the particular problem. The component
information includes the various hardware, software and
operating conditions found in the monitored computer
system. The monitoring computer system determines if a
condition of a predetermined severity exists in the
monitored computer system according to diagnostic
information provided from the monitored computer system.
The diagnostic information is represented in the
monitoring computer system as a hierarchical
representation of the monitored computer system. The
hierarchical representation provides present state
information indicating the state of hardware and software
components and operating conditions of the monitored
computer system. The resolution information relating to
the condition is retrieved from a resolution database and
relevant component information is retrieved from the
hierarchical representation of the computer system and
presented to a support engineer to assist them in
diagnosing the problem in the monitored computer system.

The present invention differs from this related
art in that the cited prior art focuses on a system that
describes a problem found on a monitored system and
advises the user of possible resolutions. The method of
the present invention does not attempt to advise the user,
as it is more concerned with detecting and reporting
failures and bad data trends that may indicate potential
future failures. Many of these conditions are
self-explanatory. The cited art also appears to be
a distributed application in which the monitoring system
is responsible for determining if a problem condition is
present. However, in the present invention, all
monitoring is performed locally and is tailored to use
only a set of special monitoring policies that apply to
the local system.

BRIEF SUMMARY OF THE INVENTION:

It is therefore a method of the present
invention to provide, in a multiprocessor system, the
ability to monitor and respond to a flexible set of
conditions, adding and removing conditions as needed; the
ability to provide an estimated time to failure for
certain conditions; and the ability to collect violated
conditions for processing by a separate script or
program. Those features provided by the service are shown
in exemplary form to operate in the Windows .NET
environment. Features provided by the HealthEvents.dll
(dynamic link library) can be used from any COM (Component
Object Model)-capable scripting or programming language
running in a Windows NT-like environment (NT, Windows
2000, Windows XP, Windows .NET). A Server Director
script, as written, functions only in the Unisys Server
Director or NetIQ AppManager environment, but can be
modified to work in any Windows scripting environment.

The method of the HealthMonitor Service loads
the HealthEvents dynamic link library so that the client
can create a HealthEvents Object or a PredictiveEvents
Object. This "Local Object" (the HealthEvents or
PredictiveEvents object created in the client's local
programming environment) is connected to a Collection of
Global HealthEvents or PredictiveEvents, after which
individual Violation Events are returned to the client.

At start up, the PredictiveEvents and
HealthEvents Collections are processed against a set of
pre-determined policies (P), which determine what to
monitor, how often to monitor, and what action to take if
a policy is violated. Queries are made to check if the
selected policy (P) monitors long-term trends,
which may involve a Violation Event. The same check is
made for Predictive Data.

Each Provider (a source of Health Monitoring
data, such as a Windows Event Log) is checked to see if
it is currently available. Then the Provider is checked
against any policy P applicable to that Provider with the
final result of returning information about any policy
violation by the selected provider. Then corrective
action can be instituted by the HealthMonitor service and
HealthEvents objects via the User Client Application or
script of item 712 of FIG. 7.

Still other objects, features and advantages of
the present invention will become readily apparent to
those skilled in the art from the following detailed
description, wherein is shown and described only the
preferred embodiment of the invention, simply by way of
illustration of the best mode contemplated of carrying
out the invention. As will be realized, the invention is
capable of other and different embodiments, and its
several details are capable of modifications in various
obvious respects, all without departing from the
invention. Accordingly, the drawings and description are
to be regarded as illustrative in nature, and not as
restrictive, and what is intended to be protected by
Letters Patent is set forth in the appended claims. The
present invention will become apparent when taken in
conjunction with the following description and attached
drawings, wherein like characters indicate like parts,
and which drawings form a part of this application.

BRIEF DESCRIPTION OF THE DRAWINGS:

FIG. 1 is a flowchart which illustrates the
general process of the HealthMonitor utility.

FIG. 2 is a flowchart which illustrates the
process for starting HealthMonitor policies.

FIG. 3 is a flowchart illustrating a process
which monitors the HealthMonitor policy on a separate
thread.

FIG. 4 illustrates a process that checks for
HealthMonitor data providers.

FIG. 5 illustrates a process that checks the
policies of the HealthMonitor.

FIG. 6 is a flowchart that illustrates the
process flow for the HealthEvents.dll file.

FIG. 7 shows an overall block diagram in which
the method of the present invention is used.

FIG. 8 is a representation of the Unisys Server
Director Operator Console, which shows the "Predictive
Alerts" icon flashed by the HealthMonitor utility in the
leftmost ("tree view") pane and shows the
Unisys_ServerAlerts and Unisys_PredictiveAlerts scripts
in the lowermost ("jobs") pane.

GLOSSARY ITEMS:

1. Unisys Server Director: a component of the Sentinel
Software suite that facilitates management of the
platform and operating environment using a drag-and-drop
user interface to deploy any of a number of
ready-to-use Server Director scripts. Server
Director is based on the NetIQ AppManager product.

2. Server Director Scripts: scripts deployed using the
Server Director product that monitor and report or
monitor and react to various conditions in the
platform or operating environment. The scripts are
written in a variant of extended Basic for scripting
that is recognized by the script interpreter code
Unisys has licensed from NetIQ.

3. Server Sentinel: A suite of software tools sold with
Unisys ES7000 servers to provide platform management
capabilities.

4. Unisys ES7000: large-scale Unisys enterprise server
system that supports up to 32 processors running the
Windows operating system.

5. Script: A set of instructions that automate the
operation of an application or utility.

6. Threshold Violations: conditions in which a
monitored metric on a system exceeds a predefined
level. For instance, if the threshold level for the
CPU utilization metric has been set at 90%, values
of that metric above 90% would be threshold
violations.

7. Predictive Monitoring: monitoring of system metrics
for conditions that indicate a potential future
problem rather than a current failure. For example,
a decreasing trend in the amount of available disk
space indicates that the system may become unusable
in the future if the trend continues.

8. GUI (Graphical User Interface): system interface
that uses pictures, graphs, or other displays in
addition to text to convey information to the user.

9. Unisys HealthEvents COM Object: A COM object that
supports methods and properties describing a single
health violation.

10. Health Issues: problems or potential problems
with the operation or availability of the system.

11. Health Events: Health issues detected by the
Unisys HealthMonitor service. See "Health Issues".

12. Health Violations: see "Health Events".

13. Non-Predictive Alerts: alerts that pertain to
an actual current problem rather than a predicted
future problem. For example, an alert indicating a
decreasing trend in the available disk space
predicts a possible future problem; an alert that
indicates that there is no disk space left is a
non-predictive alert because the problem has actually
occurred.

14. Windows .NET Environment: processing
environment maintained by the Windows .NET operating
system.

15. XML: Extensible Markup Language that allows for
Web automation and data interchange across multiple
platforms and applications.

16. XML Statements: statements written in the XML
language.

17. Scriptable Interface: a set of methods and
properties on a COM object that are exposed for
access from scripting languages.

18. Health (of a complex multiprocessor system):
Involves the usability and availability of the
system.

19. Knowledge Scripts: The NetIQ name for scripts
executed by their AppManager product. See "Server
Director Scripts".

20. Health Monitoring (multiprocessor system):
Monitoring of system metrics to determine the
usability and availability of the system.

21. Canned XML: predefined set of statements
written in XML. These statements cannot be altered
by the user.

22. HealthEvents COM Object: see "Unisys
HealthEvents COM Object".

23. Health Events Object: see "Unisys HealthEvents
COM Object".

24. Server Director Tree View: user interface
display for the Server Director or AppManager
product.

25. Local System: system where a program has been
started, in particular a system where the
HealthMonitor service and the HealthEvents dll are
installed and running.

26. Monitoring Policies: conditions describing
metrics for system health. For example, a monitoring
policy for CPU utilization might say that the
system is healthy as long as CPU usage is 90% or
less.

27. .dll file: dynamic link library file, one of a
collection of small programs, any of which can be
called when needed by a larger program that is
running in the computer.

28. .NET: Windows .NET operating system.

29. COM-capable Scripting or Programming Language:
any programming language that understands the COM
(Component Object Model) standard.

30. NetIQ AppManager: product of NetIQ Corporation
that provides system monitoring and problem
reporting using a suite of scripts.

31. Scriptable Data Collection: a collection of
data items that is accessible via scripting.

32. HealthEvents dll: dynamic link library file
containing the methods and properties supported by
the HealthEvents and PredictiveEvents COM objects.

33. PredictiveEvents: A COM object that supports
methods and properties describing a single
predictive health violation.

34. HealthEvents dll file: see "HealthEvents dll".

35. Policy Check Process: processing thread whose
purpose is to determine the current valid set of
monitoring policies for this system. This thread is
started by the main processing thread of the
HealthMonitor service.

36. Provider X: a particular source of health
monitoring data. Examples of "providers" are the
Windows event logs and Windows performance counters.

37. Policy Y: a particular health monitoring
policy. Examples of such policies are monitoring
that CPU utilization is below 90%, or that a given
disk drive has at least 10% available space.

38. Server Sentinel Environment: Set of proprietary
software processes that are installed and run on
Unisys ES7000 servers.

39. Component Object Model (COM) Standard:
Microsoft architecture for component
interoperability that is not dependent on any
particular programming language, is available on
multiple platforms, and is extensible. See the white
paper at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncomg/html/msdn_comppr.asp
for a technical overview.

40. COM Object: A component written to the COM
standard that provides interfaces which can be
invoked by applications.

41. Long Term Trend: A collection of data points
that trend in the same direction (increasing or
decreasing) over time.

42. Current Data Point: Value of a monitored system
health indicator at this moment in time.

43. Saved Trend Data: Collection of data points
saved in a file over time; these data points can be
analyzed to determine if they constitute a trend.

44. Predictive Data: Monitored system health
indicator that can be used to predict a potential
future problem, for example the amount of available
disk space remaining.

45. PredictiveEvents Collection: A collection of
PredictiveEvents objects.

46. Local Object: An instance of a COM object
created by a client program in its local programming
environment.

47. Global HealthEvents: The system-wide collection
of HealthEvents objects maintained by the
HealthMonitor service. Client programs obtain a
local copy of this global collection for use in
their programming environment. There is also a
global PredictiveEvents collection.

48. HealthEvents Object: See "Unisys HealthEvents
COM Object".

49. Unisys-Supplied Policy File: A file of
predefined XML statements describing health
monitoring policies that is included with the
HealthMonitor service. See "Canned XML".

50. Thread: A sequence of instructions from a
single Windows application that execute
independently of the parent process; in this case, the
HealthMonitor service spawns off separate threads to
monitor for policy and provider changes
independently of the main processing flow in the
HealthMonitor service.

GENERAL OVERVIEW:

The Unisys HealthMonitor service automates
management of system health monitoring, prediction, and
notification, and provides the ability to correct system
health issues in the Server Sentinel environment.

The exemplary Windows .NET service portion of
the solution involved herein does the following:

(a) executes the health monitoring policy for
the system, which is expressed in the form of
an inbuilt sequence of XML statements
describing conditions to monitor for and
responses to take when those conditions are
encountered. The flexibility of XML allows
complex monitoring conditions to be expressed.

(b) runs automatically at system startup; no
configuration or other user intervention is
required.

(c) monitors only the subset of conditions
described in the canned XML that is applicable
to the current system configuration; conditions
involving items of hardware or software not
present on the system are not activated
unless/until those items become present, and
items that are removed after the service
commences will cease to be monitored when the
service notices their absence.

(d) includes a trend analysis module that uses
the data from several of the monitored metrics
to calculate the length of time to a potential
future failure; these long-term trend warnings
and other warnings based on shorter-term data
provide a degree of "predictive" warning of
potential problems that was previously absent.

(e) collects information about the normal state
of the system as reported by several metrics;
this information can be used by a future
release of the service to improve the
monitoring policy on the fly.
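
The trend analysis of item (d) can be illustrated with a small sketch. The source does not state how the time to failure is calculated; the least-squares straight-line fit below is an assumption chosen for illustration, and the function name `estimate_time_to_failure` and its parameters are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def estimate_time_to_failure(
    samples: Sequence[Tuple[float, float]],
    failure_level: float = 0.0,
) -> Optional[float]:
    """Estimate the time remaining until a metric reaches failure_level.

    samples holds (time, value) pairs, e.g. (hours, percent of disk free).
    A straight line is fitted by least squares (an assumed method, not
    taken from the source). Returns None when no failure is predicted.
    """
    n = len(samples)
    if n < 2:
        return None  # a trend needs at least two data points
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope == 0:
        return None  # flat trend: the failure level is never reached
    intercept = mean_v - slope * mean_t
    # Time at which the fitted line crosses the failure level.
    t_fail = (failure_level - intercept) / slope
    remaining = t_fail - samples[-1][0]
    return remaining if remaining > 0 else None
```

With free disk space falling from 100% to 80% over two hours, the fitted line reaches 0% eight hours after the last sample, which is the kind of estimate a long-term trend warning could carry.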

The monitoring service uses a set of interfaces
implemented in the Unisys HealthEvents COM object to
capture information about health violations and predicted
violations on the system. The HealthEvents object
maintains the collection of violation events and exports
a scriptable interface that can be used to retrieve one
or more of the events for display or further processing
by some other application running on the system. In this
particular implementation, the Server Director retrieves
and processes the saved events.

The final piece of the solution involves two
new knowledge scripts (712) that run in the Server
Director (an application in the Unisys Server 702). The
scripts provide notification of health events as they are
detected, in an environment and a form that is already
familiar to existing Server Sentinel customers. These
scripts retrieve the server (non-predictive) alerts and
predictive alerts from the event collections maintained
by the HealthEvents object and flash the associated icon
in the Server Director tree view to direct the end user
to the site of the problem. (Predictive alerts will flash
both the icon for the affected system component and the
"Predictive Alerts" icon; server alerts will flash the
icon of the affected component; see FIG. 8.) Server
Director also provides a rich set of additional
corrective and notification actions that can be
associated with either or both scripts, as the user
desires.

DETAILED DESCRIPTION:

The general purpose of the software methodology
described herein is to monitor the health of
multiprocessor systems, and appropriately respond to a
current or potential system health problem, thereby putting a
package of data describing the event into a scriptable
data collection that is accessed by a script running in a
separate process of the system, called Server Director.
This Server Director monitors multiple systems using a
script-driven approach and has a GUI (Graphical User
Interface) that displays information about the systems it
monitors.

FIG. 7 illustrates a generalized block diagram
in which the method of the present invention could be
used. A Microsoft Windows .NET operating system exists
(700), which communicates data to and from a Unisys
server 702. A series of processes are included in module
720, which include a HealthEvents.dll file (706), which
communicates with a data store 708. The data store 708
contains the PredictiveEvents and HealthEvents
collections 710. The data in these collections is
accessed through the HealthEvents.dll (706) hosted by the
Microsoft Windows .NET operating system 700. This series
of processes and data (720) receives its input from the
HealthMonitor service 704. A user client application or
script 712, also hosted by the Microsoft Windows .NET
operating system 700, maintains communication with the
set of processes 720 as well.

FIG. 1 is a flowchart that illustrates the
process flow of the HealthMonitor program. The
executable file for the HealthMonitor starts
automatically once the system has been started (Block
1000). PredictiveEvents and HealthEvents collections are
created at step 1002. It should be noted that these
events are defined in the HealthEvents.dll file 706. The
independent ProviderCheck process is then initiated
(Block 1004), which is described further in FIG. 4. The
independent PolicyCheck process is initiated next (Block
1006). The process is illustrated in further detail in
FIG. 5. Next, the Unisys Engineering-supplied monitoring
policies that are applicable to this system are started
(Block 1008). This step is discussed further in FIG. 2.
Next, the service enters a state in which it waits for
service termination (step 1010).
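
The startup order of FIG. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's actual code: the function `run_health_monitor`, its callback parameters, and the use of Python threads and plain lists in place of the COM collections are all hypothetical.

```python
import threading
from typing import Callable, Dict, List


def run_health_monitor(
    provider_check: Callable[[], None],
    policy_check: Callable[[], None],
    start_policies: Callable[[Dict[str, List[dict]]], None],
    stop_requested: threading.Event,
) -> Dict[str, List[dict]]:
    """Follow the FIG. 1 order; all callbacks are hypothetical stand-ins."""
    # Block 1002: create the two event collections (plain lists here).
    collections: Dict[str, List[dict]] = {
        "HealthEvents": [],
        "PredictiveEvents": [],
    }
    # Blocks 1004 and 1006: start the independent check processes.
    workers = [
        threading.Thread(target=provider_check, daemon=True),
        threading.Thread(target=policy_check, daemon=True),
    ]
    for worker in workers:
        worker.start()
    # Block 1008: start the applicable monitoring policies.
    start_policies(collections)
    # Step 1010: wait until service termination is requested.
    stop_requested.wait()
    for worker in workers:
        worker.join()  # assumes the check loops also observe the stop event
    return collections
```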

FIG. 2 is a flowchart that illustrates the
start of HealthMonitor policies. The process begins by
looking at each policy "P" defined in a Unisys-supplied
policy file (Block 2000). An inquiry is made at step
2006 to check if P is enabled. If P is not enabled (NO
to inquiry 2006), the process returns to check the
remaining policies P defined in the Unisys-Supplied
Policy file back at step 2000. If the answer to inquiry
2006 is YES, and P is enabled, the attributes for P are
read from the file to
determine what to monitor, how often, and what action, if
any, to take when the policy is violated (Block 2002).
For example, one policy monitors the CPU utilization of a
processor every 20 seconds and writes a warning message
to the event log if the utilization exceeds 90%.
Monitoring on P then begins on a separate processing
thread T (Block 2004) initiated by the HealthMonitor
service. This process can be seen in more detail in FIG.
3. The process then continues to loop back to step 2000,
where the remaining policies P defined in the policy file
are checked to see if they should be deployed on this
system.
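
To make the FIG. 2 steps concrete, the sketch below reads a policy file and keeps only the enabled policies. The XML layout is invented for illustration, since the source does not show the format of the Unisys-supplied policy file; the element and attribute names, the `Policy` class, and `load_enabled_policies` are all assumptions. The CPU policy mirrors the example in the text (checked every 20 seconds, warning above 90%).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical policy file; the real schema is not given in the source.
POLICY_XML = """
<policies>
  <policy name="CpuUtilization" enabled="true" metric="cpu_percent"
          interval_seconds="20" threshold="90" action="log_warning"/>
  <policy name="DiskSpace" enabled="false" metric="disk_free_percent"
          interval_seconds="3600" threshold="10" action="log_warning"/>
</policies>
"""


@dataclass
class Policy:
    name: str
    metric: str            # what to monitor
    interval_seconds: int  # how often to monitor
    threshold: float
    action: str            # what to do when the policy is violated


def load_enabled_policies(xml_text: str) -> List[Policy]:
    policies = []
    root = ET.fromstring(xml_text.strip())
    for node in root.iter("policy"):            # Block 2000: each policy P
        if node.get("enabled") != "true":       # Inquiry 2006: is P enabled?
            continue
        policies.append(Policy(                 # Block 2002: read attributes
            name=node.get("name", ""),
            metric=node.get("metric", ""),
            interval_seconds=int(node.get("interval_seconds", "0")),
            threshold=float(node.get("threshold", "0")),
            action=node.get("action", ""),
        ))
    # Block 2004 (not shown): the caller starts one thread T per policy.
    return policies
```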

FIG. 3 is a flowchart which illustrates the
process for monitoring HealthMonitor policy P on a
separate thread T. The process begins by checking the
data item(s) specified for this policy at step 3000. An
inquiry is then made (Diamond 3002) to check if P
monitors for a long-term trend. If the answer to inquiry
3002 is Yes, and P does monitor for a long-term trend,
the current data point is added to the saved trend data
at step 3004 and the process continues to step 3006. If
the answer to inquiry 3002 is No, and P does not monitor
for a long-term trend, the process continues to make
another inquiry at step 3006.

Inquiry 3006 checks to see if P is violated. If
the answer to inquiry 3006 is No, and P is not violated,
the process continues at inquiry 3016 to check if the
service is stopping. If the answer to inquiry 3016 is
Yes, and the service is indeed stopping, then T
terminates itself and the process exits at step 3018. If
the answer to inquiry 3016 is No, and the HealthMonitor
service is not stopping, a process to wait for the "time
interval" specified for this policy is initiated (Block
3020). The process then loops back to step 3000 to check
the specified policy, and continues to go through the
process again.

If the answer to inquiry 3006 is Yes, and P has
been violated, a violation event is created and added to
the HealthEvents collection (Block 3008). For example, the
policy that monitors CPU utilization will be violated if
CPU usage exceeds 90% for a reasonable period of time; in
this case, a violation event is added to the collection
for this CPU usage violation.

An inquiry is then made to check if P monitors
predictive data (Diamond 3010). If the answer to inquiry
3010 is No, and P does not monitor predictive data, the
action specified for that policy, if any, is taken at
step 3014. If the answer to inquiry 3010 is Yes, and P
does monitor predictive data, the violation event is
added to the PredictiveEvents collection (Block 3012).
Next, the action specified for this policy, if any, is
taken (Block 3014). For example, a policy that monitors
the status of a critical system service may respond to
finding the service not running by attempting to restart
it. An inquiry is then made at step 3016 to check if the
HealthMonitor service is stopping. If the answer to
inquiry 3016 is Yes, and the service is indeed stopping,
the thread T terminates itself and the process exits at
step 3018. If the answer to inquiry 3016 is No, and the
service is not stopping, a process to wait for the time
interval specified for this policy is initiated (Block
3020). The process then loops back to step 3000 to check
the specified policy, and continues to go through the
process again.
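
The FIG. 3 loop for one policy P on thread T can be sketched as below. This is a simplified illustration: the function `monitor_policy` and its parameters are hypothetical, plain lists stand in for the HealthEvents and PredictiveEvents collections, and a single reading above the threshold counts as a violation, whereas the text describes usage exceeding the level "for a reasonable period of time".

```python
import threading
from typing import Callable, List, Optional


def monitor_policy(
    read_value: Callable[[], float],
    threshold: float,
    interval_seconds: float,
    stop_requested: threading.Event,
    health_events: List[dict],
    predictive_events: List[dict],
    trend_data: Optional[List[float]] = None,
    predictive: bool = False,
    action: Optional[Callable[[float], None]] = None,
) -> None:
    """Run the monitoring loop for one policy until the service stops."""
    while True:
        value = read_value()                     # step 3000: check data item
        if trend_data is not None:               # inquiry 3002: long-term trend?
            trend_data.append(value)             # step 3004: save data point
        if value > threshold:                    # inquiry 3006: P violated?
            event = {"value": value, "threshold": threshold}
            health_events.append(event)          # Block 3008
            if predictive:                       # inquiry 3010
                predictive_events.append(event)  # Block 3012
            if action is not None:
                action(value)                    # Block 3014: take the action
        if stop_requested.is_set():              # inquiry 3016: stopping?
            return                               # step 3018: T terminates
        stop_requested.wait(interval_seconds)    # Block 3020: wait interval
```

Waiting on the stop event rather than sleeping lets the thread wake early when the service is stopping; it then performs one more check and exits at the stop inquiry, matching the loop back to step 3000.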

FIG. 4 is a flowchart that illustrates the
process of the HealthMonitor ProviderCheck loop. First,
a timer is started to run for 24 hours (Block 4000), and
then for each provider X defined in the Unisys-supplied
monitoring policies (Block 4002), an inquiry is made
(Diamond 4004). Inquiry 4004 checks to see if provider X
is currently available. If the provider X is not
available (No), Provider X is disabled (Block 4006), and
the process continues at Inquiry 4010. If provider X is
available (Yes), then another inquiry is made at step
4007 to check if provider X is already enabled. If
provider X is already enabled, then the process continues
at Inquiry 4010. If the answer to inquiry 4007 is No,
and provider X is not already enabled, provider X will be
enabled at step 4008. Separate processing is then started
on Thread T for provider X at step 4009. Inquiry 4010 then
checks if provider X is the last provider defined in the
monitoring policies. If there are no more providers to
check, then the timer is set to start for 24 hours again
at step 4000, and the process continues; otherwise the
process returns to Block 4002 to check for the next
provider. The loop described by the steps in FIG. 4
allows the HealthMonitor service to detect changes in
provider availability so that only those monitoring
policies whose providers are enabled will be run. If a
provider is added after the HealthMonitor service starts,
the policies for that provider will become active the
next time this loop executes; likewise, policies that
have been rendered inapplicable because their provider
has become unavailable will be disabled at the next loop.
The loop does not terminate until the HealthMonitor
service is terminated.
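
One pass of the ProviderCheck loop, and the 24-hour repetition around it, might look like the following sketch. The function names, the set of enabled provider names, and the callbacks are hypothetical; the source does not say whether the first pass runs before or after the timer expires, so running it first is an assumption.

```python
import threading
from typing import Callable, Iterable, Set


def provider_check_pass(
    providers: Iterable[str],
    is_available: Callable[[str], bool],
    enabled: Set[str],
    start_provider_thread: Callable[[str], None],
) -> None:
    """One pass over all providers, following Blocks 4002 to 4010."""
    for provider in providers:               # Block 4002: each provider X
        if not is_available(provider):       # Inquiry 4004: available?
            enabled.discard(provider)        # Block 4006: disable X
        elif provider not in enabled:        # Inquiry 4007: already enabled?
            enabled.add(provider)            # step 4008: enable X
            start_provider_thread(provider)  # step 4009: start thread T


def provider_check_loop(
    providers: Iterable[str],
    is_available: Callable[[str], bool],
    enabled: Set[str],
    start_provider_thread: Callable[[str], None],
    stop_requested: threading.Event,
    period_seconds: float = 24 * 60 * 60,
) -> None:
    """Repeat the pass every 24 hours until the service stops."""
    provider_list = list(providers)
    while not stop_requested.is_set():
        provider_check_pass(provider_list, is_available, enabled,
                            start_provider_thread)
        stop_requested.wait(period_seconds)  # Block 4000: 24-hour timer
```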

FIG. 5 illustrates the HealthMonitor
PolicyCheck loop. First, the timer is set to start and
run for 24 hours at step 5000. Next, for each policy Y
defined in the Unisys-supplied monitoring policies (Block
5002), a set of steps is initiated. An inquiry is made
at step 5004 to check if the policy Y provider is
currently enabled. If the answer to inquiry 5004 is No,
and the policy Y provider is not enabled, policy Y is
disabled at step 5010, and then the process continues at
Inquiry 5011.

If the answer to inquiry 5004 is Yes, and the
provider is currently enabled, then another inquiry is
made at step 5006. Inquiry 5006 checks to see if the
policy Y is applicable to the local system where the
HealthMonitor service is running. If the answer to
inquiry 5006 is No, policy Y is disabled, and the process
continues at step 5011. If the answer to inquiry 5006 is
Yes, policy Y is enabled at step 5008, and a process to
start monitoring policy Y on its provider-processing
Thread T is initiated at step 5009. Step 5009 is
described in further detail in FIG. 3. Inquiry 5011 then
checks if policy Y is the last policy defined in the
monitoring policies. If there are no more policies to
check, then the process continues back to step 5000 to
start the timer to run for 24 hours; otherwise the
process returns to Block 5002 to check the next policy.
The loop described by the steps in FIG. 5 allows the
HealthMonitor service to detect changes in the system
configuration so that only those policies that monitor an
element present on the system will be run. If additional
hardware or software is installed after the HealthMonitor
service starts, any policies that monitor the new
hardware or software will become active the next time
this loop executes; likewise, policies that have been
rendered inapplicable, because the items they monitor
have been uninstalled, will be disabled at the next loop.
The loop does not terminate until the HealthMonitor
service is terminated.
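
One pass of the policy check described for FIG. 5 can be sketched in Python in the same spirit. The Policy class, the provider_enabled mapping, and the start_monitoring callback are hypothetical names introduced for illustration; how applicability to the local system is actually determined (Inquiry 5006) is not specified here, so it is modeled as a plain boolean.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Policy:
    """Hypothetical stand-in for a monitoring policy (name not from the source)."""
    name: str
    provider: str          # name of the provider this policy relies on
    applicable: bool       # outcome of Inquiry 5006 (assumed precomputed)
    enabled: bool = False


def check_policies(policies: Iterable[Policy],
                   provider_enabled: Dict[str, bool],
                   start_monitoring: Callable[[Policy], None]) -> List[str]:
    """Run one pass over all policies (Blocks 5002 through 5011).

    Returns the names of the policies that are enabled after this pass.
    """
    active = []
    for pol in policies:                                   # Block 5002
        if not provider_enabled.get(pol.provider, False):  # Inquiry 5004: No
            pol.enabled = False                            # step 5010
            continue
        if not pol.applicable:                             # Inquiry 5006: No
            pol.enabled = False                            # policy Y is disabled
            continue
        pol.enabled = True                                 # step 5008
        start_monitoring(pol)                              # step 5009
        active.append(pol.name)
    return active
```

As written, the sketch follows the text literally and calls start_monitoring on every pass for each enabled policy; whether a real service would skip policies that are already being monitored is not stated in the description and is left open here.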

FIG. 6 is a flowchart that illustrates the
HealthEvents.dll process flow. First, the
HealthEvents.dll (dynamic link library) is loaded upon
demand from the HealthMonitor Service or a script client
at step 6000. The client then creates a HealthEvents or
PredictiveEvents object (Block 6002). An inquiry is then
made at step 6004 to check if the HealthMonitor service
is active. If the answer to inquiry 6004 is No, and the
HealthMonitor service is not active, an error is emitted
and the process exits at step 6006.

If the answer to inquiry 6004 is Yes, and the
HealthMonitor service is indeed active, a process is
initiated to connect the client's local
HealthEvents or PredictiveEvents object to the global
HealthEvents or PredictiveEvents collection maintained by
the HealthMonitor Service (Block 6008). Next, another
inquiry is made (Diamond 6010) to check again if the
HealthMonitor service is active. If the answer to
inquiry 6010 is No, and the HealthMonitor service is not
active, an error is emitted and the process exits at step
6012. If the answer to inquiry 6010 is Yes, and the
HealthMonitor service is indeed active, a process is
initiated to return information about the collections as
requested by the client (number of items in collection,
iterator to select the items one at a time) at step 6014.
Once again, another inquiry to check if the HealthMonitor
service is active is initiated at step 6016. If the
answer to inquiry 6016 is No, and the HealthMonitor
service is not active, an error is emitted and the
process exits at step 6018. If the answer to inquiry
6016 is Yes, and the HealthMonitor service is indeed
active, a process is initiated to return information
about the individual violation events as requested by the
client at step 6020. The information that can be
retrieved includes the severity of the violation (error,
warning, information), the target icon to flash in the
Server Director tree view if applicable, and a short
headline and longer detailed message describing the
violation. For instance, a CPU usage violation event
would have a severity of "error", a target of "CPU <x>"
(where <x> is the number of the affected CPU), a short
message of "Utilization for Processor <x>", and a
detailed message of "Processor usage for CPU <x> exceeds
threshold of 90%, current value is <y>%" (where <y> will
be greater than 90). The client then deletes the local
HealthEvents or PredictiveEvents object at step 6022.
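
The client-side flow of FIG. 6 and the CPU usage example can be sketched in Python as follows. This is not the HealthEvents.dll interface: ViolationEvent, cpu_usage_event, ServiceInactiveError, and read_events are hypothetical names introduced for illustration, the service check is modeled as a caller-supplied function, and the trailing percent sign on the current value in the detailed message is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ViolationEvent:
    """Hypothetical record holding the fields described for a violation event."""
    severity: str   # "error", "warning", or "information"
    target: str     # icon to flash in the tree view, if applicable
    headline: str   # short message
    detail: str     # longer detailed message


def cpu_usage_event(cpu: int, current: float,
                    threshold: float = 90.0) -> ViolationEvent:
    """Build the CPU usage violation event used as the example in the text."""
    return ViolationEvent(
        severity="error",
        target=f"CPU {cpu}",
        headline=f"Utilization for Processor {cpu}",
        detail=(f"Processor usage for CPU {cpu} exceeds threshold of "
                f"{threshold:g}%, current value is {current:g}%"),
    )


class ServiceInactiveError(RuntimeError):
    """Raised when the (assumed) service check reports the service inactive."""


def read_events(service_active: Callable[[], bool],
                global_events: Iterable[ViolationEvent]) -> List[Tuple[str, str]]:
    """Follow the FIG. 6 flow, re-checking the service before each step."""
    def require_active() -> None:
        if not service_active():
            raise ServiceInactiveError("HealthMonitor service is not active")

    require_active()              # Inquiry 6004
    local = list(global_events)   # Block 6008: connect to the global collection
    require_active()              # Inquiry 6010
    count = len(local)            # step 6014: collection information
    require_active()              # Inquiry 6016
    # step 6020: information about the individual violation events
    return [(local[i].severity, local[i].headline) for i in range(count)]
```

Repeating the service check before each step mirrors the three inquiries in the flowchart, so a client fails with an error as soon as the service goes away rather than reading stale data.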

Described herein has been a monitoring method
for a multiprocessor system which provides an exemplary
service to, for example, the .NET environment. This
service can detect any violation of Policy-set conditions
and can use scripts to provide corrective measures. This
service operates to automatically start a
"health-monitoring policy", to monitor conditions in the
system configuration, and to provide system health trend
analysis.

Though one embodiment of the invention has been
illustrated, other embodiments may be implemented which
still utilize the essence of the invention as defined in
the attached claims.